Harry Potter Text Generator

Grab Harry Potter Text


In [2]:
# http://www.glozman.com/textpages.html
    
# Harry Potter 1 - Sorcerer's Stone.txt
# Harry Potter 2 - Chamber of Secrets.txt
# Harry Potter 3 - The Prisoner Of Azkaban.txt
# Harry Potter 4 - The Goblet Of Fire.txt
# Harry Potter 5 - Order of the Phoenix.txt
# Harry Potter 6 - The Half Blood Prince.txt
# Harry Potter 7 - Deathly Hollows.txt

In [3]:
with open("texts/HarryPotter1-SorcerersStone.txt", "r") as f:
    text = f.read().lower()

In [4]:
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
'corpus length: {}  total chars: {}'.format(len(text), len(chars))


Out[4]:
'corpus length: 442745  total chars: 54'

In [5]:
print(text[:100])


harry potter and the sorcerer's stone 

chapter one 

the boy who lived 

mr. and mrs. dursley, of n

Create the Training Set

Build the training dataset: take 40 characters of text and save the 41st character as the target. The model learns that a given 40-character sequence should be followed by that 41st character. Use a step size of 3 so consecutive windows overlap, which yields far more 40/41 samples than non-overlapping windows would.


In [6]:
maxlen = 40
step = 3
sentences = []
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i+maxlen])
    next_chars.append(text[i + maxlen])
    
print("sequences: ", len(sentences))


sequences:  147569

In [7]:
print(sentences[0])
print(sentences[1])


harry potter and the sorcerer's stone 


ry potter and the sorcerer's stone 

cha

In [8]:
print(next_chars[0])


c

One-Hot Encode


In [9]:
import numpy as np

# One-hot encode: X holds each 40-char sequence, y the character that follows it.
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

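As a quick sanity check, the encoded arrays should line up with the numbers above: 147,569 sequences, each 40 characters long, over a 54-character alphabet.


In [ ]:
# Expect X with shape (147569, 40, 54) and y with shape (147569, 54).
print(X.shape, y.shape)
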
Create the Model


In [10]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

model = Sequential()
# Three stacked LSTMs; only the first needs an input_shape, and the first two
# return full sequences so the next LSTM receives the whole time series.
model.add(LSTM(256, input_shape=(maxlen, len(chars)), return_sequences=True))
model.add(LSTM(256, return_sequences=True))
model.add(LSTM(256))
model.add(Dense(2 * len(chars)))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
optimizer = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

model.summary()


Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 40, 256)           318464    
_________________________________________________________________
lstm_2 (LSTM)                (None, 40, 256)           525312    
_________________________________________________________________
lstm_3 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dense_1 (Dense)              (None, 108)               27756     
_________________________________________________________________
dense_2 (Dense)              (None, 54)                5886      
_________________________________________________________________
activation_1 (Activation)    (None, 54)                0         
=================================================================
Total params: 1,402,730
Trainable params: 1,402,730
Non-trainable params: 0
_________________________________________________________________

Train the Model


In [54]:
epochs = 100
batch_size = 512

model.fit(X, y, batch_size=batch_size, epochs=epochs)


Epoch 1/100
147569/147569 [==============================] - 35s 240us/step - loss: 2.8068
Epoch 2/100
147569/147569 [==============================] - 34s 231us/step - loss: 2.1695
Epoch 3/100
147569/147569 [==============================] - 34s 231us/step - loss: 1.8816
Epoch 4/100
147569/147569 [==============================] - 34s 232us/step - loss: 1.6884
Epoch 5/100
147569/147569 [==============================] - 34s 232us/step - loss: 1.5492
Epoch 6/100
147569/147569 [==============================] - 34s 232us/step - loss: 1.4441
Epoch 7/100
147569/147569 [==============================] - 34s 231us/step - loss: 1.3570
Epoch 8/100
147569/147569 [==============================] - 34s 232us/step - loss: 1.2845
Epoch 9/100
147569/147569 [==============================] - 34s 231us/step - loss: 1.2135
Epoch 10/100
147569/147569 [==============================] - 34s 231us/step - loss: 1.1454
Epoch 11/100
147569/147569 [==============================] - 34s 231us/step - loss: 1.0750
Epoch 12/100
147569/147569 [==============================] - 34s 232us/step - loss: 1.0077
Epoch 13/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.9373
Epoch 14/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.8653
Epoch 15/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.7913
Epoch 16/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.7167
Epoch 17/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.6445
Epoch 18/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.5714
Epoch 19/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.5034
Epoch 20/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.4400
Epoch 21/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.3816
Epoch 22/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.3327
Epoch 23/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.2876
Epoch 24/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.2528
Epoch 25/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.2246
Epoch 26/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.2020
Epoch 27/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.1827
Epoch 28/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.1687
Epoch 29/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.1581
Epoch 30/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.1483
Epoch 31/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.1406
Epoch 32/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.1343
Epoch 33/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.1289
Epoch 34/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.1250
Epoch 35/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.1207
Epoch 36/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.1168
Epoch 37/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.1138
Epoch 38/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.1094
Epoch 39/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.1096
Epoch 40/100
147569/147569 [==============================] - 35s 238us/step - loss: 0.1056
Epoch 41/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.1015
Epoch 42/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.1019
Epoch 43/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0987
Epoch 44/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0965
Epoch 45/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0958
Epoch 46/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.0924
Epoch 47/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.0923
Epoch 48/100
147569/147569 [==============================] - 35s 236us/step - loss: 0.0898
Epoch 49/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.0893
Epoch 50/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0847
Epoch 51/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0856
Epoch 52/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0847
Epoch 53/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0831
Epoch 54/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0824
Epoch 55/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0809
Epoch 56/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.0815
Epoch 57/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0810
Epoch 58/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0781
Epoch 59/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0763
Epoch 60/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0791
Epoch 61/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0770
Epoch 62/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0754
Epoch 63/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0746
Epoch 64/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.0745
Epoch 65/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0743
Epoch 66/100
147569/147569 [==============================] - 34s 234us/step - loss: 0.0715
Epoch 67/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0737
Epoch 68/100
147569/147569 [==============================] - 35s 234us/step - loss: 0.0728
Epoch 69/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0693
Epoch 70/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0697
Epoch 71/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0694
Epoch 72/100
147569/147569 [==============================] - 35s 236us/step - loss: 0.0703
Epoch 73/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0682
Epoch 74/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0662
Epoch 75/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.0670
Epoch 76/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0684
Epoch 77/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0648
Epoch 78/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0650
Epoch 79/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0658
Epoch 80/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0646
Epoch 81/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0643
Epoch 82/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0651
Epoch 83/100
147569/147569 [==============================] - 34s 231us/step - loss: 0.0644
Epoch 84/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0637
Epoch 85/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0626
Epoch 86/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0622
Epoch 87/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0633
Epoch 88/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0607
Epoch 89/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0609
Epoch 90/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0606
Epoch 91/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0573
Epoch 92/100
147569/147569 [==============================] - 35s 235us/step - loss: 0.0596
Epoch 93/100
147569/147569 [==============================] - 35s 236us/step - loss: 0.0589
Epoch 94/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0601
Epoch 95/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0581
Epoch 96/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0568
Epoch 97/100
147569/147569 [==============================] - 35s 236us/step - loss: 0.0584
Epoch 98/100
147569/147569 [==============================] - 34s 233us/step - loss: 0.0594
Epoch 99/100
147569/147569 [==============================] - 34s 232us/step - loss: 0.0586
Epoch 100/100
147569/147569 [==============================] - 35s 236us/step - loss: 0.0583
Out[54]:
<keras.callbacks.callbacks.History at 0x1f08a3d7a90>

In [55]:
# model.save_weights("potter_lstm_weights_0568.h5")

Generate New Sequences


In [11]:
model.load_weights("potter_lstm_weights_0568.h5")

In [12]:
import random

def sample(preds, temperature=1.0):
    # Sample a character index from the predicted distribution,
    # reweighted by temperature (lower = more conservative choices).
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature  # small epsilon avoids log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

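The temperature argument (the "diversity" value below) rescales the predicted distribution before sampling: temperatures below 1 sharpen it toward the most likely character, while 1.0 leaves it unchanged. A minimal sketch of that effect on a made-up three-character distribution (the probabilities here are illustrative, not model output):


In [ ]:
# Illustrative only: how temperature reshapes a toy distribution.
toy_preds = np.array([0.7, 0.2, 0.1])
for t in [0.2, 0.5, 1.0]:
    scaled = np.exp(np.log(toy_preds) / t)
    scaled /= scaled.sum()
    print(t, np.round(scaled, 3))
# Lower temperatures pile probability onto the most likely character;
# at 1.0 the distribution is unchanged.
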
In [13]:
import sys
start_index = random.randint(0, len(text) - maxlen - 1)
for diversity in [0.2, 0.5, 1.0]:
    print()
    print('----- diversity:', diversity)
    generated = ''
    sentence = text[start_index: start_index + maxlen]
    generated += sentence
    print('----- Generating with seed: "' + sentence + '"')
    sys.stdout.write(generated)
    for i in range(400):
        x = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x[0, t, char_indices[char]] = 1.
        preds = model.predict(x, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]
        generated += next_char
        sentence = sentence[1:] + next_char
        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()


----- diversity: 0.2
----- Generating with seed: "s stupid, fat rat yellow." 

he waved hi"
s stupid, fat rat yellow." 

he waved his wifd, forgotte robblows and petunia field what he mied on its mind. the carent see than all day. it was hermione near thas he wooded mountain troll. and mone. o'd all the hat if i cold him to his face." 

"look done this me somethin' all sitten yer muties." 

he said, "ei'll can to expections chisers. ust past whise-here. it was only anyone found off his mother, they're not all in be a sholl tim

----- diversity: 0.5
----- Generating with seed: "s stupid, fat rat yellow." 

he waved hi"
s stupid, fat rat yellow." 

he waved his wifd, because they weren't starting and there of from the mirror of erised quietly, bown here his lapped in a wazm. "he are yound mation in the school, "i'm said! ron and hermione. 

harry put his lang stopper meseley twiffing out with his really. he took a low ron dripped it fride mirs charlew hermione waited hather had even senkoned out of earing untious footstoppisdowed pleated on the ribbsar

----- diversity: 1.0
----- Generating with seed: "s stupid, fat rat yellow." 

he waved hi"
s stupid, fat rat yellow." 

he waved his wird behind the first years feele as he could. 

"you two we've got to come, it," he said forward. "than is -- all yelled your father arrivy yeh on, i was doing out what he's spind, too. it was a sistack start chiven's copparte cloak, he knew that gient stuck in his eyes on the door. 

"well vernon, his if it was doing," he said suddenly and schoollegs and school riding stepped on the dangest as

In [ ]: